Semantic Segmentation

Abstract

This project builds, trains, and tests a CNN for semantic segmentation, following the FCN-32s and FCN-16s variants described in this CVPR Paper. Unlike the paper, I replace the VGG-16 backbone with a pre-trained ResNet-18 model.

Dataset

The KITTI dataset was used. It comprises 200 images with their ground-truth masks, split 70-15-15 into training, validation, and test sets, respectively. It has 35 class categories, such as building, wall, traffic light, car, and bus. Each ground-truth mask assigns a class label to every pixel.
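As a quick check of the split arithmetic, a 70-15-15 split of 200 images gives 140, 30, and 30 images. The following is a minimal sketch using only the standard library; the function name `split_indices`, the random shuffle, and the fixed seed are assumptions for illustration, not the project's actual splitting code.

```python
import random


def split_indices(n_items, ratios=(0.70, 0.15, 0.15), seed=0):
    """Shuffle item indices and cut them into train/val/test lists.

    Hypothetical helper: the shuffle and the seed are assumptions.
    """
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)  # reproducible shuffle

    n_train = round(n_items * ratios[0])
    n_val = round(n_items * ratios[1])

    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]  # remainder goes to the test set
    return train, val, test
```

Calling `split_indices(200)` returns three disjoint index lists of lengths 140, 30, and 30, which can then be used to select image/mask pairs.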

Architecture

Ref: ResNet paper

The above architecture was used with the following modifications to convert it into a Fully Convolutional Network (FCN):

1) Replaced the original average pooling layer with a fixed-size average pooling layer (referred to as “avgpool”) with a 7 x 7 kernel, a stride of 1, and no padding.

2) Added a new convolutional layer with a 1 x 1 kernel. The number of kernels is (number of classes + 1), which includes a background class; this layer computes the class scores at each spatial location.

3) Added a transposed convolution layer with stride 32 and kernel size 64 that upsamples the classifier output back to the input image size.

4) For FCN-16, the output of the final layer of conv4_x is additionally combined with the upsampled features from avgpool.

Evaluation Metric

1) Pixel-level intersection-over-union (IoU): Pixel-level IoU = TP/(TP+FP+FN), where TP, FP, and FN are the numbers of true positive, false positive, and false negative pixels, respectively. Pixel-level IoU is computed on each class separately (treating other classes as negative).

2) Mean Intersection-over-Union (mIoU): a simple average of the per-class pixel-level IoUs; it reflects how well the model performs across all classes.
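Both metrics can be computed directly from integer label maps. The following is a minimal NumPy sketch; the function names `per_class_iou` and `mean_iou` are hypothetical, and skipping classes that appear in neither the prediction nor the ground truth (instead of counting them as 0) is an assumption that the project may not share.

```python
import numpy as np


def per_class_iou(pred, target, num_classes):
    """Return IoU = TP / (TP + FP + FN) for each class.

    Classes absent from both pred and target get NaN (assumption).
    """
    ious = np.full(num_classes, np.nan)
    for c in range(num_classes):
        pred_c = pred == c      # pixels predicted as class c
        target_c = target == c  # pixels labelled as class c
        tp = np.logical_and(pred_c, target_c).sum()
        fp = np.logical_and(pred_c, ~target_c).sum()
        fn = np.logical_and(~pred_c, target_c).sum()
        denom = tp + fp + fn
        if denom > 0:
            ious[c] = tp / denom
    return ious


def mean_iou(pred, target, num_classes):
    """Average the per-class IoUs, ignoring NaN entries."""
    return float(np.nanmean(per_class_iou(pred, target, num_classes)))
```

For example, with ground truth `[[0, 0, 1], [1, 2, 2]]` and prediction `[[0, 1, 1], [1, 2, 0]]`, the per-class IoUs are 1/3, 2/3, and 1/2, so the mIoU is 0.5.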

Results

Achieved mIoU scores of 0.4 and 0.34 for FCN-16 and FCN-32, respectively. FCN-16 segments finer details by leveraging the features of the previous layer (conv4_x).

Project information

  • Language: Python
  • Framework: PyTorch
  • IDE: Google Colab
  • Architecture: ResNet18-FCN16/FCN32
  • Project URL: Semantic-segmentation